REAL  TIME  FAULT  DETECTION  AND  DIAGNOSTICS 
USING  FPGA-BASED  ARCHITECTURES 
THESIS 

Nathan  P.  Naber,  Second  Lieutenant,  USAF 
AFIT/GCE/ENG/10-04

DEPARTMENT  OF  THE  AIR  FORCE 
AIR  UNIVERSITY 

AIR  FORCE  INSTITUTE  OF  TECHNOLOGY 

Wright-Patterson  Air  Force  Base,  Ohio 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


The  views  expressed  in  this  thesis  are  those  of  the  author  and  do  not  reflect  the  official  policy  or 
position  of  the  United  States  Air  Force,  Department  of  Defense,  or  the  United  States 
Government. 



Presented  to  the  Faculty 

Department  of  Electrical  and  Computer  Engineering 
Graduate  School  of  Engineering  and  Management 
Air  Force  Institute  of  Technology 
Air  University 

Air  Education  and  Training  Command 
In  Partial  Fulfillment  of  the  Requirements  for  the 
Degree  of  Master  of  Science  in  Computer  Engineering 

Nathan  P.  Naber,  BS 
Second  Lieutenant,  USAF 

March  2010 





Abstract 

Errors within circuits caused by radiation continue to be an important concern to
developers. A new methodology of real-time fault detection and diagnostics utilizing FPGA-based
architectures under radiation was investigated in this research. The contributions of
this research are focused on three areas: a full test platform to evaluate a circuit while under
irradiation, an algorithm to detect and diagnose fault locations within a circuit, and a
characterization of Triple Design Triple Modular Redundancy (TDTMR), a new form of TMR. Five
different test setups (injected fault test, gamma radiation test, thermal radiation test, optical
laser test, and optical flash test) were used to assess the effectiveness of these three research
goals.

The testing platform was constructed with two FPGA boards, the Device Under Test
(DUT) and the controller board, to generate and evaluate specific vector sets sent to the DUT.
The testing platform consolidates a wide range of test and measurement equipment, and the
associated work hours, onto one small, reprogrammable, and reusable FPGA. The same device was
used in multiple test setups. The controlling logic can be interchanged to test multiple circuit
designs under various forms of radiation.

The detection and diagnostic algorithm was designed to determine fault locations in real
time. The algorithm diagnoses the fault location using inverse deductive elimination.
Fault lists were developed using test generation tools and were used to narrow
the possible fault locations within the circuit. The algorithm is able to detect single stuck-at
faults based on these lists. The algorithm can also detect multiple output errors but is not able
to diagnose multiple stuck-at faults in real time.

TDTMR utilizes three unique forms of logic rather than three copies of identical
circuitry. The three different adder designs used for this research are a behavioral adder, a
carry look-ahead adder, and a ripple carry adder.

Based on the five tests, the testing platform operated successfully. The detection and
diagnosis algorithm was able to detect errors. The injected fault test was the only test in which
the algorithm was able to properly diagnose the location of the fault. The results also
unexpectedly showed that the voting unit failed before any of the adders while under radiation.
Dose rate and total dose have differing effects on the DUT. The goals of this research were met
by completing a fully interchangeable and operational testing platform, an algorithm that detects
and diagnoses errors in real time, and an initial evaluation of TDTMR.

I.  Introduction 

Electronics and technology continue to dominate the market and play an indispensable
role in space exploration. Space, while seemingly benign, is both volatile and chaotic. To study
such a vast, unknown area, man has launched satellites, space shuttles, and telescopes into
space to gather more information about this unfamiliar place. These machines and devices
contain a vast array of electronics. Terrestrial electronics are largely protected from the effects
of radiation; electronics in space are not.

Circuits and transistors endure considerable stress in the harsh space environment. The study
of electronics in this environment is crucial for their reliable operation. The effects of radiation
on electronics are a principal concern. The current technique for protecting electronics in space
is to make them “Radiation Hardened”. While this brute force method is effective, a cheaper,
more effective means of combating radiation effects is always desirable.

A Field-Programmable Gate Array (FPGA) is an integrated circuit with the versatility to
be reprogrammed for multiple applications. A design utilizing FPGAs offers both cost-cutting
and time-saving advantages over a design utilizing a conventional Application-Specific
Integrated Circuit (ASIC) [23]. Thus, FPGAs are increasingly being considered for various new
device applications throughout the commercial business world.

1.1  Motivation 

Newer technologies are increasingly being developed on FPGAs due to their low costs
and increased performance results over traditional ASIC devices. Space radiation has the
potential of producing errors at the transistor level of electrical devices [1]. The current methods
to combat these conditions and minimize the overall damage due to errors are to use
radiation-hardened ASICs or the programmable logic devices found within FPGAs. Radiation-hardened
devices are not 100% effective and they are expensive [3]. Commercial FPGAs have been
investigated as a cheaper alternative for electronic devices in space; however, they are not
radiation hardened [2, 3].

In an effort to combat this problem, computer models and real simulations have been
used to demonstrate the impact of errors induced by various forms of radiation on specific
components of commercial FPGAs in addition to the FPGA itself. It is still difficult to
characterize the effects of radiation on electronics. An FPGA offers versatility in creating a
platform to characterize and evaluate the damaging effects of radiation. An FPGA-based test
platform reduces the cost of acquiring all the measuring tools needed to evaluate a circuit.

An issue that has not been carefully investigated is locating and diagnosing
specific faults within a circuit in real time. Testing circuits is a challenging undertaking, and
diagnosing the location of an error presents an even greater challenge. The
ability to locate errors in real time can aid in the understanding and classification of radiation
effects. A better understanding of these effects can lead to better countermeasures and error
prevention techniques for protecting circuits from these errors.

Both design logic and hardening by design have been used in an effort to correct errors
in a space environment. Some methods have proven more effective than others, and
Triple Modular Redundancy (TMR) has been shown to be an effective fault redundancy method for
error correction [3-6]. Further investigation into improving TMR can demonstrate the importance
of fault redundancy in integrated circuits.

1.2  Scope 

The research in this thesis continues the study of radiation effects on
electronics. Several forms of radiation are used in this research: a Co-60 gamma cell (providing
ionizing radiation only), thermal radiation, and optical radiation. The two FPGA boards chosen
in this research are the Virtex-II-Pro and the FX12 mini module series, both manufactured by
Xilinx. The testing platform is designed so that researchers can also use it with forms of
radiation not employed in this thesis.

1.3  Contributions 

The overarching goal of this research was to characterize the effects of different radiation
forms on integrated circuits. The research facilitates the potential replacement of physically
hardened ASIC and FPGA devices and allows for improvements in the design of
non-radiation-hardened electronics.

A test platform was developed in an effort to establish a base system that has the potential
to be used in various forms of radiation with multiple circuit designs. The test platform utilizes a
bridge board with ribbon cables to provide reliable communication between the two FPGAs.
This design offered significant improvement over the previous Ethernet cable design [17]. The
testing platform provides the ability to generate a number of input vectors, monitor the values
that are sent to the DUT, and analyze the resulting data.

Along with this testing platform, an algorithm was developed to diagnose and locate the
faults within the circuit. Upon uncovering an error within the circuit, the device identifies which
circuit design’s output was faulty and switches to a real-time diagnostic mode. The circuit
undergoes a series of test and diagnostic vectors designed to pinpoint the exact stuck-at fault
location within the integrated circuit with the best resolution possible without physically
destroying the chip. The testing platform helps validate the diagnosis algorithm on the FPGA
while under various radiation environments.

Finally, the research performed in this thesis attempts to demonstrate whether
design-hardening techniques can reduce system vulnerability to external errors. A new architecture
design and programming model was implemented to increase detection, correction, and tolerance
of failures, furthering the potential uses for FPGAs in harsh space radiation environments.
Triple Design TMR (TDTMR) was implemented in the logic in an effort to correct errors
without causing downtime in the circuit. Instead of the traditional three copies of the same
circuit design used in TMR, a unique approach of using different design implementations of the
same circuit was taken. This new method demonstrates that different design logic could be more
robust under the effects of radiation than three identical copies of the same design.
This implementation is intended to improve the ability to correct these errors without needing to
reprogram the entire microelectronic device.

The contributions of this work include an analysis of commercial off-the-shelf (COTS)
reconfigurable electronics in radiation environments as an alternative to the current use of
radiation-hardened devices. Overall, the main contributions of this work are as follows:

1. A test platform utilizing an FPGA module that can send, receive, and process
data while under radiation.

2. Real-time diagnostics that uncover an error and pinpoint its location with the best
resolution in the quickest manner.

3. The use of design hardening techniques in a commercial FPGA with TDTMR.

1.4  Thesis  Organization 

The work performed in this thesis is organized into five chapters. Following this
introduction, Chapter 2 provides background information on the topic and its related sources.
The background covers radiation effects on electronics, current TMR methods used in circuit
design, and testing platforms.

Chapter 3 discusses the methodology used to formulate an optimal solution for the
research work performed on this topic. It also discusses the design choices made for the
particular implementation used to complete the project. Chapter 4 covers the results obtained
through the methodology and a critical analysis of them. Finally, Chapter 5 summarizes
and concludes the relevant work achieved through this thesis and discusses possible future
research within this area.

II. Background


2.1  Chapter  Overview 

This chapter describes key areas of background information related to the topic of this
thesis. Information on FPGAs, TMR, fault detection and correction, radiation effects on
electronics, and radiation sources is covered in this chapter. A set of improvements made to
TMR is also described. There are many forms of radiation effects on electronics, but only Total
Dose Effects, Single Event Effects, and Single Event Upsets are explained. The gamma and
thermal radiation sources are also described in this chapter.

2.2  Field  Programmable  Gate  Arrays  (FPGA) 

Field Programmable Gate Arrays (FPGAs) are increasingly demanded by circuit
designers from all fields due to their flexibility in meeting multiple requirements such as high
performance, low cost, and on-the-fly reprogramming. FPGAs have been
known to be slower, less energy efficient, and generally achieve less functionality than their
fixed ASIC counterparts. However, the decreasing costs and development time needed to
implement FPGAs compared to designs using discrete logic devices have made programmable
logic devices favorable in space and avionic applications as well [7]. These integrated
circuits contain an array of Configurable Logic Blocks (CLBs) and programmable interconnects
that allow the connection of different gates and structures. CLBs are made of
basic elements which include look-up tables, multiplexers, and flip flops along with routing
logic, pass transistors, and I/O pads. Each CLB can implement any Boolean function of its
inputs and can be linked to other CLBs via routing blocks to implement more complex logic. The
CLBs are interconnected through a general routing matrix that comprises an array of routing
switches located at the intersections of horizontal and vertical routing channels [8].

FPGA devices have been used in space for more than a decade with a mixed level of
success; however, until recently few reprogrammable devices have been used on spacecraft due to
their sensitivity to involuntary reconfiguration caused by Single Event Upsets (SEU) [4]. Space
electronic designers are now more willing to utilize FPGAs in high radiation environments in
place of radiation-hardened devices. FPGAs perform well in the high throughput signal processing
applications often used in space. With the rising costs of radiation-hardened devices, research
into utilizing FPGAs for hardened-by-design testing has grown significantly.

As previously mentioned, despite the growth of FPGA development, FPGAs remain
susceptible to radiation errors. Since FPGAs store their programming data, or configuration, in
an SRAM-like configuration memory, radiation can actually alter the intended circuit [6]. The
static memory elements and combinational logic paths are susceptible to upset from heavy ion
particles within the space environment. Protection of the combinational logic is therefore
required to avoid involuntary changes of functionality. It has been important to develop some
form of mitigation technique to account for these errors and ensure reliable operation.
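
The effect of a configuration upset on a look-up table can be illustrated with a small software model. This is only a sketch under stated assumptions: the functions `make_lut`, `lut_eval`, and `flip_bit` are hypothetical names introduced here, the LUT is modeled as a plain truth table, and the 3-input XOR is an arbitrary example function rather than anything taken from the designs in this research.

```python
def make_lut(func, n_inputs):
    """Build the configuration bits (truth table) of an n-input LUT for func."""
    config = []
    for address in range(2 ** n_inputs):
        # Input i is bit i of the address.
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        config.append(func(*bits) & 1)
    return config


def lut_eval(config, *inputs):
    """Return the bit stored at the address formed by the inputs."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return config[address]


def flip_bit(config, address):
    """Model a single event upset: copy config and invert one stored bit."""
    upset = list(config)
    upset[address] ^= 1
    return upset


# Intended function: 3-input XOR (illustrative choice).
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
# Upset the bit at address 5, which corresponds to a=1, b=0, c=1.
upset_xor3 = flip_bit(xor3, 5)
```

In this model a single flipped configuration bit leaves the LUT correct for seven of the eight input combinations and wrong for one, which is why an upset in configuration memory changes the implemented logic function rather than just a stored data value.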

It is important to note that even though the use of FPGAs in space and radiation
environments is rapidly expanding, radiation-induced errors remain a significant hindrance to
performance. New strategies have been implemented to help foster the growing field of FPGAs
in the space environment through hardening by design rather than radiation hardening. These
mitigation strategies for correcting the effects of radiation errors lead the way for this particular
research along with other research in this field.

2.3  Triple  Modular  Redundancy  (TMR) 

Recently, several TMR methods have been introduced in an effort to reliably combat the
persistent problem of errors in integrated circuits. Some of the methods have been more
promising than others. The method chosen for this study of TMR was the technique of design
hardening, also categorized as fault redundancy and correction.

TMR has been widely used as a form of design hardening to mitigate an upset as it occurs in
the device configuration. TMR has been shown to greatly improve the reliability of FPGA designs
subject to SEUs [2]. This mitigation technique traditionally uses three identical copies of a
circuit running in parallel. The outputs then go through a majority voter circuit. If an error
occurs on one of the bits in one circuit, the voter votes out the error. To improve the basic
concept of TMR, a few design enhancement steps were made to the simple three circuit and voter
layout (Figure 1).


Figure  1  -  Basic  TMR 
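
The majority voting step of basic TMR can be sketched in a few lines. The following is an illustration only, not the voter used in this research: `majority_vote`, `tmr`, and the 4-bit `increment` module are hypothetical names, and a fault is modeled simply as an XOR mask applied to one copy's output.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(module, x, upset_masks=(0, 0, 0)):
    """Run three copies of module on x and vote on the results.

    Each copy's output is XORed with its mask to model a fault in that copy
    (an assumption made for illustration).
    """
    outputs = [module(x) ^ mask for mask in upset_masks]
    return majority_vote(*outputs)


def increment(x):
    """Example module: 4-bit incrementer."""
    return (x + 1) & 0xF
```

With this model, `tmr(increment, 5, (0b0100, 0, 0))` still returns 6 because the two healthy copies outvote the faulty one. Errors in two copies are also masked as long as they affect different bit positions, but the same bit failing in two copies defeats the vote, and a fault in the single voter itself is not protected at all.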

The first addition was to expand the one-bit voter layout shown in Figure 2 and triplicate
the voter unit so there would no longer be a single point of failure (Figure 3). This change
significantly reduces the configuration sensitivity of the design [2].

Figure 2 - TMR Circuit Diagram

Figure  3  -  TMR  with  Triplicated  Voter 

It was shown that these two designs suffer from resynchronization problems [3]: if a
faulty bit is ‘repaired’ in one of the voting logic units, that bit would not be synchronized with
the other two copies of logic. Correcting the bit introduces a slight delay which,
compounded over time, may cause a synchronization issue. This problem can be prevented by
placing the voting circuitry within the feedback path of the circuit [3]. This simple addition
helps prevent synchronization problems and increases the reliability of the circuit (Figure 4).
Each of these designs was tested for its reliability in mitigating SEUs. Table 1 depicts the
results from a series of tests on four different TMR designs.

Table 1 - Evaluation Results of TMR Designs

                 Simple Incrementer                   Up/Down Loadable Counter (single clock)
Design           LUTs        Failures  Speed (MHz)    LUTs          Failures  Speed (MHz)
No Redundancy    8           446       220            10            463       220
1 Voter          35 (~4x)    410       217 (99%)      [illegible]   484       217 (99%)
3 Voters         51 (~6x)    14        199 (91%)      [illegible]   14        213 (97%)
Feedback         51 (~6x)    14        160 (73%)      [illegible]   15        157 (72%)
Map Feedback     27 (~3x)    15        194 (88%)      N/A

Figure 4 - TMR Design with Feedback

Some recent design enhancements to TMR have been developed. Three
new TMR techniques were explored: Functional Triple Modular Redundancy (FTMR),
Selective Triple Modular Redundancy (STMR), and Partial Triple Modular Redundancy. FTMR
shows that both sequential and combinational blocks can be protected by means of TMR [4].
STMR extends the basic TMR technique by identifying “sensitive” gates in the circuit and then
introducing TMR selectively at those gates [5]. Finally, Partial TMR extends STMR by
giving priority to the circuit components that are more susceptible to persistent errors and
applying TMR to them [6].

Each of these methods offers an enhanced version of TMR. The TDTMR used in this
research utilizes a few of these enhancements, but the main difference is the use of
three different logic designs instead of three copies of the same design.
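
The idea of voting across three different implementations of the same function can be sketched in software. The three adder styles follow those named in the abstract (behavioral, ripple carry, carry look-ahead); everything else is an assumption made for illustration, including the 4-bit width, the function names, and the XOR-mask fault model. The sketch is not the implementation evaluated in this research.

```python
WIDTH = 4                          # assumed operand width
MASK = (1 << (WIDTH + 1)) - 1      # sum bits plus carry-out


def behavioral_add(a, b):
    """Behavioral adder: rely on the built-in '+' operator."""
    return (a + b) & MASK


def ripple_carry_add(a, b):
    """Ripple carry adder: chain full adders bit by bit."""
    carry, result = 0, 0
    for i in range(WIDTH):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result | (carry << WIDTH)


def carry_lookahead_add(a, b):
    """Carry look-ahead adder: carries derived from generate/propagate terms.

    Hardware flattens the carry recurrence into two-level logic; the loop
    below evaluates the same equations sequentially.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(WIDTH)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(WIDTH)]
    c = [0]
    for i in range(WIDTH):
        c.append(g[i] | (p[i] & c[i]))
    result = sum((p[i] ^ c[i]) << i for i in range(WIDTH))
    return result | (c[WIDTH] << WIDTH)


def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def tdtmr_add(a, b, upset=None):
    """Vote across three different adder designs.

    upset = (module_index, xor_mask) corrupts one module's output to model
    a fault in that module (illustrative fault model).
    """
    adders = (behavioral_add, ripple_carry_add, carry_lookahead_add)
    outputs = [adder(a, b) for adder in adders]
    if upset is not None:
        index, mask = upset
        outputs[index] ^= mask
    return majority_vote(*outputs)
```

For example, `tdtmr_add(9, 7, upset=(1, 0b00100))` still returns 16 because the behavioral and carry look-ahead results outvote the corrupted ripple carry result. The voter is identical to the one in conventional TMR; only the redundant modules differ.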

2.4  Fault  Detection  and  Correction 

Fault detection and correction has always been a crucial aspect of digital
circuits. With the number of logic gates continuing to increase, it has become
virtually impossible to test every possible input combination of a circuit design.
A defect is an error introduced into a device during the manufacturing process. A
fault is said to be detected if a specific test pattern applied to the primary inputs produces
primary output results that differ from those of the original (fault-free) design.

High level fault modeling provides the ability to use simulation-based design verification.
Bridging faults, delay faults, and stuck-at faults are the most popular fault models in digital
testing at this level [9]. The single ‘stuck-at’ fault model has been the most versatile fault
model for testing circuit logic. A stuck-at fault is assumed to affect only the interconnection
between gates. If a circuit has n signal lines, then there are potentially 2n single ‘stuck-at’
faults within the circuit, since each line can be stuck at 0 or stuck at 1. The goal is to find a
set of test patterns that can detect all possible faults for each
circuit design. Automatic Test Pattern Generation (ATPG) was developed in an effort to find
manufacturing defects using a small number of test patterns that identify a high
number of possible faults.
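
The 2n fault count and the definition of detection can be made concrete with a small fault simulator. The sketch below assumes a hypothetical three-gate circuit, y = (a AND b) OR (NOT c), chosen only because it has no fanout, so every signal line is a single net; the netlist format and function names are illustrative.

```python
from itertools import product

INPUTS = ("a", "b", "c")
# (output line, gate type, input lines), listed in topological order.
GATES = (
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y", "OR", ("n1", "n2")),
)
OUTPUT = "y"
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NOT": lambda x: 1 - x,
}

LINES = INPUTS + tuple(gate[0] for gate in GATES)
# Two single stuck-at faults per line: stuck-at-0 and stuck-at-1.
FAULTS = [(line, value) for line in LINES for value in (0, 1)]


def simulate(vector, fault=None):
    """Evaluate the circuit; fault = (line, stuck_value) forces one line."""
    values = {}

    def drive(line, value):
        values[line] = fault[1] if fault and fault[0] == line else value

    for name, value in zip(INPUTS, vector):
        drive(name, value)
    for out, op, ins in GATES:
        drive(out, OPS[op](*(values[i] for i in ins)))
    return values[OUTPUT]


def detecting_vectors(fault):
    """All input vectors whose output differs from the fault-free output."""
    return [
        vec
        for vec in product((0, 1), repeat=len(INPUTS))
        if simulate(vec) != simulate(vec, fault)
    ]
```

This circuit has n = 6 lines and therefore 12 single stuck-at faults. Line `a` stuck at 0 is detected only by the vector (1, 1, 1), while the output stuck at 1 is detected by any vector for which the fault-free output is 0.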

ATPG is a testing method developed to find a test sequence that allows the user to
differentiate between correct circuit behavior and faulty circuit behavior. The goal of ATPG is to
find a set of test patterns that achieves the highest fault coverage. A pattern set with 100% fault
coverage consists of tests that detect every possible ‘stuck-at’ fault in a circuit. 100% fault
coverage does not necessarily guarantee high quality, since other faults such as bridging or open
faults could still occur. There are also cases in which a fault in the circuit does not show up for
any of the generated input sequences. In one such case the fault is intrinsically undetectable,
meaning that no test pattern exists that can detect that particular fault. These faults are
redundant in the sense that their presence does not influence the observable circuit
functionality. In addition, the ATPG problem is NP-complete, so no algorithm is known that solves
every instance in a ‘practical’ (polynomial) amount of time; there will therefore be cases where
patterns exist but the ATPG algorithm gives up because finding them would take a significant
amount of time [9].

Table 2 - History of Algorithm Speedups [9]

Algorithm                    Estimated speedup over D-Algorithm     Year
                             (normalized to D-ALG CPU time)
D-ALG [551]                  1                                      1966
PODEM [258]                  7                                      1981
FAN [229, 232, 233]          23                                     1983
TOPS [360]                   292                                    1987
SOCRATES [576]               1574 (ATPG System)                     1988
Waicukauski et al. [708]     2189 (ATPG System)                     1990
EST [110, 253]               8765 (ATPG System)                     1991
TRAN [122]                   3005 (ATPG System)                     1993
Recursive learning [376]     485                                    1995
Tafertshofer et al. [648]    25057                                  1997
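
Fault coverage and redundant faults, as discussed in the ATPG paragraph above, can be illustrated on a deliberately redundant circuit. The sketch assumes a hypothetical two-gate circuit, y = a OR (a AND b), which always equals a; faults are modeled on whole nets only, and all names are illustrative.

```python
from itertools import product

INPUTS = ("a", "b")
# y = a OR (a AND b); the AND gate is logically redundant because y == a.
GATES = (
    ("n1", "AND", ("a", "b")),
    ("y", "OR", ("a", "n1")),
)
OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y}
LINES = INPUTS + tuple(gate[0] for gate in GATES)
FAULTS = [(line, value) for line in LINES for value in (0, 1)]


def simulate(vector, fault=None):
    """Evaluate output y; fault = (line, stuck_value) forces one net."""
    values = {}

    def drive(line, value):
        values[line] = fault[1] if fault and fault[0] == line else value

    for name, value in zip(INPUTS, vector):
        drive(name, value)
    for out, op, ins in GATES:
        drive(out, OPS[op](*(values[i] for i in ins)))
    return values["y"]


def detected_by(patterns):
    """Set of faults that change the output for at least one pattern."""
    return {
        f for f in FAULTS
        if any(simulate(p) != simulate(p, f) for p in patterns)
    }


def fault_coverage(patterns):
    """Fraction of all single stuck-at faults detected by the pattern set."""
    return len(detected_by(patterns)) / len(FAULTS)


ALL_PATTERNS = list(product((0, 1), repeat=len(INPUTS)))
# Faults that no input pattern can expose at the output.
UNDETECTABLE = set(FAULTS) - detected_by(ALL_PATTERNS)
```

Here the single pattern (1, 0) gives 25% coverage, and adding (0, 0) raises it to 62.5%. No pattern set can do better on this circuit, because the three remaining faults (`b` stuck at 0, `b` stuck at 1, and `n1` stuck at 0) never change the observable output.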

Historically, many algorithms have been developed to generate test
patterns that test circuit designs for faults. Testing these integrated circuits with significant
fault coverage has proven to be a daunting task due to its complexity. One of the earliest
algorithms, and a cornerstone of ATPG today, is the D-Algorithm. This algorithm provided the
building blocks necessary to cultivate faster and more efficient ATPG algorithms. Table 2
depicts a brief history of the algorithms used for ATPG with estimated speedup relative to the
D-Algorithm.

These algorithms employ heuristics that find all necessary signal assignments for a test as
early as possible. It has been shown that achieving considerable fault coverage is far more
complex for sequential circuits than for combinational circuits. For combinational fault
simulation, the complexity is O(n^2). Big O notation describes how a quantity grows by bounding
it with a simpler function. For sequential fault simulation, the complexity is estimated to be
between O(n^2) and O(n^3) based on empirical measurements [9]. These algorithms have been
employed to test for defects on manufactured circuits and continue to be developed to find
faster and more efficient ways of detecting faults.

2.5  Fault  Diagnosis 

Faults are understood to be an abnormal change of system function or a defect at the
component, equipment, or subsystem level that may or may not lead to physical failure or
breakdown [10]. If faults occur, the outcome can be catastrophic and may endanger
lives. It is therefore imperative to uncover the location of faults. Some traditional
approaches to fault diagnosis have been installing multiple sensors and hardware, analytical or
functional redundancy, and a combination of hardware and analytical redundancy [11].

The purpose of this research was to implement a method of detecting and
diagnosing faults within the circuit while under radiation. Diagnosing the location of
faults and errors on a circuit continues to be investigated thoroughly. Much of the recent
progress in fault diagnosis can be credited to the extensive use of fault equivalence to reduce the
number of fault conditions for analysis, since only one fault from each fault equivalence class
needs to be retained as a representative [12]. However, the difficulty still lies in finding
diagnostic methods that are suitable for real-time execution.

The goal of this research was to use similar diagnostic methods and algorithms
to detect errors found within the circuit design and diagnose the location of the faults with the
best possible resolution. This means the fault was reduced to the minimum number of locations
that could be distinguished using stuck-at fault modeling. The information provided by this
diagnostic gives the user the ability to better determine the next action needed
to correct the problem. The following research attempts to uncover whether real-time diagnosis
under radiation has the potential of being successful.
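
Diagnosis by elimination and its resolution limit can be sketched with a one-bit full adder. This is a generic candidate-elimination illustration under stated assumptions, not the algorithm developed in this research: the netlist, the function names, the choice to model faults on whole nets only, and the use of all eight input vectors as the diagnostic set are all hypothetical.

```python
from itertools import product

INPUTS = ("a", "b", "cin")
# One-bit full adder; (output line, gate type, input lines).
GATES = (
    ("n1", "XOR", ("a", "b")),
    ("s", "XOR", ("n1", "cin")),
    ("n2", "AND", ("a", "b")),
    ("n3", "AND", ("n1", "cin")),
    ("cout", "OR", ("n2", "n3")),
)
OUTPUTS = ("s", "cout")
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}
LINES = INPUTS + tuple(gate[0] for gate in GATES)
FAULTS = [(line, value) for line in LINES for value in (0, 1)]
VECTORS = list(product((0, 1), repeat=len(INPUTS)))


def simulate(vector, fault=None):
    """Return (s, cout); fault = (line, stuck_value) forces one net."""
    values = {}

    def drive(line, value):
        values[line] = fault[1] if fault and fault[0] == line else value

    for name, value in zip(INPUTS, vector):
        drive(name, value)
    for out, op, ins in GATES:
        drive(out, OPS[op](*(values[i] for i in ins)))
    return tuple(values[o] for o in OUTPUTS)


def diagnose(observe):
    """Eliminate every candidate fault that disagrees with an observation.

    observe(vector) returns the outputs measured on the device under test.
    """
    candidates = set(FAULTS)
    for vec in VECTORS:
        seen = observe(vec)
        candidates = {f for f in candidates if simulate(vec, f) == seen}
    return candidates
```

Injecting `a` stuck at 0 (for example `diagnose(lambda v: simulate(v, ("a", 0)))`) leaves exactly one candidate. Injecting `n2` stuck at 1 leaves three candidates, `n2`, `n3`, and `cout` each stuck at 1, because these faults force the carry output to 1 for every input and cannot be told apart from the outputs alone. That set is the best resolution available under the stuck-at model for this circuit.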

2.6  Radiation  Effects  on  Electronics 

Operational reliability is one of the principal concerns in microelectronic systems.
This is particularly true of space-bound systems, since they are exposed to ionizing radiation and
their operating conditions do not allow for quick and easy restoration of failed or malfunctioning
components. The harsh space environment can cause severe damage to and malfunction of
unprotected electronics. Trapped protons and electrons in the Earth’s radiation belts and cosmic
rays are major obstacles to the normal operation of space electronics in this environment.
Long periods of exposure to energetic particles in space can degrade even the best
device’s performance, leading to component failure. Everything from major components to the
wiring and cabling of electronic devices can be seriously affected by radiation.

This section explores key components of radiation effects on electronics related to the
research goals of this thesis. Three key issues are discussed further: total dose effect, single
event effects (SEE), and single event upsets (SEU).

2.6.1  Total  Dose  Effect 

Electronic devices in space suffer long-term radiation effects, mostly due to electrons and
protons. Total dose effect refers to the integrated radiation dose that is accumulated by space
electronics over a certain period of time, which can reduce mission lifetimes due to long-term
damage to devices, ICs, or solar cells. Long-term exposure can cause device threshold shifts,
increased device leakage and power consumption, timing changes, and decreased functionality
[13]. After exposure to sufficient total-dose radiation, most insulating materials, such as
capacitor dielectrics, circuit-board materials, and cabling insulators, become less insulating,
that is, more electrically leaky. Certain conductive materials, such as metal-film
resistors, can also change their characteristics under exposure to total-dose radiation [14]. These
changes may not be constant with time after irradiation and may depend on the dose rate at
which the radiation is received.

The  radiation  damage  in  the  silicon  dioxide  layers  consists  of  three  components:  the 
buildup  of  trapped  charge  in  the  oxide,  an  increase  in  the  number  of  interface  traps,  and  an 
increase  in  the  number  of  bulk  oxide  traps  [15].  The  ionizing  radiation  primarily  affects  the  gate 
and  field  oxide  layers.  In  CMOS  devices,  the  gate  oxide  becomes  ionized  by  the  dose  it  absorbs 
and  ionization  produces  electron-hole  pairs  in  insulation  layers.  The  electrons  have  high 
mobility,  but  the  holes  have  lower  mobility.  The  free  electrons  and  holes  drift  under  the 
influence  of  the  electric  field  that  is  induced  in  the  oxide  by  the  gate  voltage.  The  holes  that 
escape  “initial”  recombination  transport  through  the  oxide  toward  the  silicon  and  silicon  dioxide 
interface  by  hopping  through  localized  states  in  the  oxide  [16].  A  small  number  of  holes  become 
trapped  in  the  gate  oxide.  Trapped  charge  in  the  oxide  and  at  interface  regions  changes  the 
threshold  voltage  and  mobility  of  the  gate  and  field-oxide  transistors,  therefore  modifying  their 
characteristics [23]. The accumulated charge can be high enough to keep the transistors
permanently open or closed, so that the source-drain current is no longer controlled by the gate,
leading to device failure. Trapped holes are not stable; they gradually anneal with time. The
overall effect depends on bias conditions and device technology. With devices becoming
smaller,  the  gate  oxides  in  these  shrinking  transistors  are  growing  thinner.  Being  thinner,  the 
gate  oxide  traps  less  positive  charge  overall  [14].  Therefore,  transistors  with  smaller 
technologies are becoming inherently more radiation resistant. Figure 5 shows how the total
dose affects the threshold voltage and causes a shift in both ‘n’ and ‘p’ transistors. The threshold
voltage continues to change during the annealing process after irradiation, as shown in Figure 6.

Figure 5 - Threshold Voltage of ‘n’ and ‘p’ Transistors during Irradiation [15]

Figure 6 - Irradiation and Annealing Effects [15]

Since the number of electron-hole pairs generated is directly proportional to the amount
of  energy  absorbed  by  the  device  material,  the  total  damage  is  also  roughly  proportional  to  the 
total  dose  of  radiation  received  by  the  device  [15]. 
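
As a rough illustration of this proportionality, the sketch below estimates how many electron-hole pairs a given dose generates per cubic centimeter of silicon dioxide. It is not taken from [15]: the constants (about 17 eV per pair and an oxide density of about 2.2 g/cm^3) are textbook approximations assumed here, and the function name is hypothetical. The estimate counts generated pairs only and ignores recombination.

```python
# Rough estimate of electron-hole pairs generated in SiO2 per unit dose.
# All constants are textbook approximations assumed for illustration,
# not values reported in this thesis.
EV_PER_GRAM_PER_RAD = 6.242e13  # 1 rad = 100 erg/g, about 6.242e13 eV/g
SIO2_DENSITY_G_CM3 = 2.2        # approximate density of the oxide
EV_PER_PAIR_SIO2 = 17.0         # approximate energy needed to create one pair


def pairs_per_cm3(dose_rad: float) -> float:
    """Electron-hole pairs generated per cm^3 of oxide for a dose in rad(SiO2)."""
    energy_ev_per_cm3 = dose_rad * EV_PER_GRAM_PER_RAD * SIO2_DENSITY_G_CM3
    return energy_ev_per_cm3 / EV_PER_PAIR_SIO2


one_rad = pairs_per_cm3(1.0)      # on the order of 8e12 pairs per cm^3
ten_krad = pairs_per_cm3(1.0e4)   # scales linearly with dose
```

Doubling the dose doubles the estimate, which is the linear relationship the paragraph above describes.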

2.6.2  Single  Event  Effects 

Another important category of radiation effects to which an integrated circuit is vulnerable
is Single Event Effects (SEE). A SEE occurs when a single high-energy particle strikes a
device, leaving behind an ionized track that can lead to sudden device or system failure (Figure
7). These failures result from the charge deposited by a single particle crossing a sensitive
region  in  the  device  and  are  a  function  of  the  amount  of  charge  collected  at  the  sensitive  node 
and  the  node  state  [13].  The  ionization  along  the  path  of  the  impinging  particle  collects  at  a 
circuit  node.  The  ionized  track  contains  equal  numbers  of  electrons  and  holes  and  is  therefore 
electrically  neutral. 


Figure 7 - Cosmic Ray Through the Drain of an NMOS Transistor

The  total  number  of  charges  is  proportional  to  the  linear  energy  transfer  of  the  incoming  particle. 

Every memory device has a certain critical charge above which a SEE or other
undesirable phenomenon can result [14]. The three largest categories of SEEs are Single Event Upsets
(SEU), Single Event Latch-up (SEL), and Single Event Burnout (SEB). The soft error or upset
(SEU) is a change in the information stored on the circuit. A hard error is characterized by
permanent or semi-permanent damage such as the latch-up (SEL) or the burnout (SEB).

Another source of SEEs is impurities in the device material. There might be traces of uranium or
thorium, both naturally radioactive elements that decay by alpha emission. The emitted alpha
particle can then deposit charge in the device and cause a SEE.

In  the  space  environment,  circuit  designers  have  been  concerned  with  both  the  protons 
and  cosmic  rays  that  lead  to  a  greatly  increased  SEE  rate.  For  cosmic  rays,  SEEs  are  typically 
caused by their heavy ion component. These cosmic rays can easily penetrate the structure of an
integrated  circuit.  Cosmic  rays  may  be  galactic  or  solar  in  origin.  Protons,  usually  trapped  in  the 
earth's  radiation  belts  or  from  solar  flares,  may  cause  direct  ionization  SEEs  in  very  sensitive 
devices. More typically, however, a proton causes a nuclear reaction near a sensitive device area,
creating an indirect ionization effect that can cause a SEE.

2.6.3  Single  Event  Upsets 

A  cosmic  ray,  or  a  secondary  ion  released  via  a  high-energy  proton-induced  nuclear 
reaction,  can  deposit  enough  energy  within  a  sensitive  node  that  an  integrated  circuit  can  be 
upset.  These  single  event  upsets  (SEUs)  are  analogous  to  soft  errors  in  electronics  or  avionics 
due  to  energetic  alpha  particles  or  atmospheric  neutrons.  SEU  is  defined  by  NASA  as 
"radiation-induced  errors  in  microelectronic  circuits  caused  when  charged  particles  (usually  from 
the  radiation  belts  or  from  cosmic  rays)  lose  energy  by  ionizing  the  medium  through  which  they 
pass,  leaving  behind  a  wake  of  electron-hole  pairs."  A  SEU  usually  manifests  itself  as  a  state 
change  or  "bit-flip"  of  a  single  data  bit  or  memory  cell  that  causes  a  momentary  glitch  in  the 
device  output  (Figure  8). 

Figure 8 - Example of SEU

If enough of these upsets occur, or if a single critical node is affected, a reset or rewriting
of the device would be required to restore normal device behavior. Single-event upsets occur in
computer memories, microprocessors, controllers, and almost any digital circuit containing
memory elements. They do not cause lasting damage to the device, but may cause lasting
problems to a system which cannot recover from such an error. Also, in very sensitive devices, a
single ion may hit two or more bits, causing simultaneous errors, known as multiple-bit upsets
(MBUs), in adjacent memory cells. As the minimum device feature size is scaled down to
smaller and smaller dimensions, the susceptibility to such SEUs has been found to increase
remarkably [15].

Research on mitigation techniques within this field has been growing significantly.
New methods have been investigated in order to lower the number of possible
SEUs on a circuit at a given time. One approach to reducing SEU and transient upset
levels, as well as eliminating the possibility of latch-up, is to use silicon on sapphire or silicon
on insulator technologies to build CMOS circuits [15]. Other methods use the
hardened-by-design methodology, e.g., Triple Modular Redundancy. With these mitigation
techniques and continued research, these phenomena can be carefully investigated to better
understand and predict when upsets could occur in space electronics.

2.7 Gamma Radiation Source

The cobalt-60 isotope (Co-60) was used as the source of ionizing radiation for this
experiment. Co-60 undergoes beta decay with a half-life of approximately 5.27 years, releasing
one electron and two gamma rays, as shown in Figure 9.

Figure 9 - Decay scheme of 60Co [20]
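
Because the source decays, a dose rate measured on one date is lower at any later date. The sketch below shows the standard half-life correction as an illustration only. It is not part of the OSU calibration procedure, the function names are hypothetical, and the default half-life of 5.27 years is the commonly tabulated value, passed as a parameter so that a different figure can be substituted.

```python
def remaining_fraction(elapsed_years: float, half_life_years: float = 5.27) -> float:
    """Fraction of the original source activity left after elapsed_years."""
    return 0.5 ** (elapsed_years / half_life_years)


def corrected_dose_rate(reference_rate: float, elapsed_years: float,
                        half_life_years: float = 5.27) -> float:
    """Scale a dose rate measured at a reference date to a later date.

    reference_rate may be in any unit (for example krad/hr); the result
    is returned in the same unit.
    """
    return reference_rate * remaining_fraction(elapsed_years, half_life_years)


# After exactly one half-life the dose rate has dropped to half.
half = corrected_dose_rate(100.0, 5.27)
```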

The Ohio State University (OSU) Nuclear Reactor Lab (NRL) in Columbus, Ohio provided a
Co-60 source. A simplified diagram of the gamma irradiator can be found in Figure 10. It
consists of a six-inch-wide aluminum tube containing a movable platform that can be lowered
into and raised out of the irradiator. The gamma irradiator cell itself sits on the bottom of a pool of
water and consists of 14 Co-60 sources evenly spread around the aluminum tube.


Figure  10  -  Co-60  Gamma  Irradiator  Layout  [17] 

When the device under test (DUT) is lowered into the tube, the radiation dose rate
depends on the location of the device relative to the center of the Co-60 source rods. However, the
dose rate curve is expressed in terms of the height of the DUT above the bottom of the moveable platform
when the platform is resting on the bottom of the aluminum tube [17]. The radiation dose curve
provided by OSUNRL is depicted in Figure 11.


[Figure: Co-60 Gamma Irradiator Dose-Rate Curve in 6" Tube; horizontal axis: Height above Bottom (in)]

Figure 11 - Dose Rate of Co-60 Irradiator [21]

2.8  Thermal  Radiation  Source 

A  1600W  Xe  lamp  thermal  simulator  was  used  for  the  irradiation  of  the  DUT  in  this 
thesis.  The  schematic  shown  below  (Figure  12)  is  accurate  with  the  exception  that  the  thermal 
simulator  used  in  these  experiments  had  an  output  direction  rotated  90  degrees  so  that  the  output 
was  perpendicular  to  the  major  axis  of  the  thermal  simulator  producing  a  horizontal  beam  [18]. 

Figure 12 - Schematic of 1600 W Solar Simulator

The  thermal  simulator  was  assembled  by  Koehl  [19]  and  the  manufacturer  recorded  a  spectral 
output  seen  in  Figure  13  below. 

[Figure: Spectral output of a full spectrum 1600W Solar Simulator, The Newport Resource Catalog (2008/2009), p. 200; axes: Irradiance (W/(m^2 nm)) versus Wavelength (nm)]

Figure 13 - Spectral Output from 1600W Thermal Simulator

The spectrum is largely a smooth Planckian distribution, reflecting the source plasma temperature,
with superimposed lines from the xenon emission spectrum. The smooth Planckian portion of the
distribution is similar to the measured intensities for a 1 kT nuclear explosion [19].

2.9 Previous Work


Research on this topic has been performed over the last few years. The work presented
in the last two AFIT theses was inconclusive. The current research fixed many problems
from the previous work and adds a significant contribution in real time fault detection and
diagnosis.

The original code developed for evaluating circuits under radiation was not operational.
The next version's code was primitive and could not support any solid conclusion. The
previous hardware constructed for communication between the two boards was suboptimal and
unusable. The earlier effort to build TMR at the base level was not implemented properly because
the code was left to the Xilinx tools to optimize and was not built structurally. The monitoring system
for collecting errors did not actually record events in real time; it polled the system at
intervals limited by the maximum speed of the RS232 link.

Not  everything  in  the  work  done  previously  was  completely  inconclusive.  Results  from 
previous  work  suggested  that  the  baseboard  was  the  cause  of  most  of  the  errors  in  the  result.  The 
results  also  demonstrated  that  dose  rate  versus  total  dose  could  be  significant.  Both  works 
performed  previously  made  an  attempt  to  characterize  the  effects  of  radiation  on  electronics  but 
failed  to  fully  implement  a  capable  measuring  system. 

The  current  research  corrected  these  problems.  The  testing  platform  utilized  ribbon 
cables,  buffers,  and  capacitors  to  correctly  implement  a  bridge  between  the  two  FPGA  boards. 
TDTMR  was  implemented  structurally  and  laid  out  onto  the  mini  module  FX12  series  board 
(Figure  14).  The  monitoring  system  was  only  used  to  see  the  results.  The  actual  data  that  was 
measured  was  written  to  a  flash  device  in  real  time.  Further  detail  of  the  implementation  of  this 
research  is  described  in  the  following  chapter. 

Figure 14 - Layout of the 3 Adders on the FX12 Board


2.10  Summary 

This chapter provides a brief background on the key areas investigated in this
research. There are many fault redundancy techniques that are used in a variety of forms in
research. This particular research focuses on TMR. Radiation effects on electronics also have a
very large and comprehensive background. A full understanding and explanation of these effects
still remains a bit of a mystery. The previous two AFIT researchers in this area did not
completely cover the most basic concepts. Although their attempts did produce a few benefits, in the long
run there were too many errors to build upon the work that was accomplished. All
the work done for this thesis was rebuilt from the ground up. The goal was to correct these
mistakes and expand on the concepts of radiation effects on electronics. Further research into
these areas can be found through the references in the bibliography.

III. Methodology


3.1  Chapter  Overview 

This chapter describes the methodology used to evaluate the
testing platform, the detection and diagnosis algorithm, and TDTMR. The first section breaks
down each portion of the testing platform. The methodology used to create the
detection and diagnosis algorithm is then described. Each test setup for the
experiment is also included. Further descriptions and diagrams can be found in the
Appendix.

3.2  Test  Platform 

The overall testing platform was broken into three parts: the controller unit, which houses
all the command logic along with the processing of data for the diagnosis algorithm; the TMR
unit or Device Under Test (DUT), which contains the circuitry being evaluated under radiation;
and the external devices, which monitor and record all the information being processed.


Figure  15  -  Block  Diagram  of  Testing  Platform 

3.2.1  TMR  Design 

The FPGA device under test (DUT) for these radiation experiments was the Xilinx Virtex
4 FX12 Series Mini-module mounted on an Avnet Mini-Module Baseboard, pictured in Figure
16. The FPGA is built in 90 nm transistor technology with 10 layers of metal interconnect and
triple oxide technology, running internally at 1.2 Volts (V) [22]. A more complete description of
the Virtex 4 Mini Module can be found in Appendix A.

Figure 16 - DUT Virtex 4 Mini-Module

The DUT component consisted of the new Triple Design TMR. The design also utilized
the triplicated voter enhancement [2]. The DUT contains three different implementations of adder
logic. The Carry Look Ahead Adder and Ripple Carry Adder designs were implemented
structurally using base level gates. The two adder diagrams can be found in Appendix B and C.
The third adder was implemented behaviorally, meaning it was left to the Xilinx
software to decide how the adder would be laid out. The three adders' results entered the
triplicated voter logic to provide fault redundancy. The outputs of each adder were
also sent to the controller board along with the results of the triplicated voter logic. Figure 17
shows the design of the entire TDTMR Unit.


[Figure: TDTMR block diagram; labeled outputs: CLA Results, RCA Results, BA Results, Voter Results]

Figure 17 - Diagram of TDTMR
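
The voting step can be illustrated with a short software model. This is only a sketch and not the VHDL used on the DUT. It assumes that the nine input bits form two 4-bit operands and a carry-in, models the three adder designs in simplified form, and applies a bitwise two-out-of-three majority vote. All function names are hypothetical.

```python
WIDTH = 4  # assumed operand width: 4 + 4 + 1 carry-in = 9 input bits


def ripple_carry_add(a: int, b: int, cin: int) -> int:
    """Chain of full adders; returns sum bits plus carry-out as one integer."""
    result, carry = 0, cin
    for i in range(WIDTH):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result | (carry << WIDTH)


def carry_lookahead_add(a: int, b: int, cin: int) -> int:
    """Carries derived from generate/propagate terms (hardware would flatten them)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(WIDTH)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(WIDTH)]
    carries = [cin]
    for i in range(WIDTH):
        carries.append(g[i] | (p[i] & carries[i]))
    result = 0
    for i in range(WIDTH):
        result |= (p[i] ^ carries[i]) << i
    return result | (carries[WIDTH] << WIDTH)


def behavioral_add(a: int, b: int, cin: int) -> int:
    """Behavioral description: let the language (or synthesis tool) do the addition."""
    return (a + b + cin) & ((1 << (WIDTH + 1)) - 1)


def majority_vote(r0: int, r1: int, r2: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit follows at least two inputs."""
    return (r0 & r1) | (r0 & r2) | (r1 & r2)


# A single corrupted copy is outvoted by the two healthy copies.
good = behavioral_add(8, 5, 1)
voted = majority_vote(good ^ 0b00100, good, good)
```

In the triplicated voter arrangement the same vote would be computed by three independent voter copies; the model above shows only one of them.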

Other components were also added to the DUT for testing purposes. A clock
generator and an error generator were added to the DUT (Figure 18). These units were used
to simulate fault injections into the design. A fault injection is an error purposefully introduced
into the integrated circuit that can be controlled by a user or automated through a computer
program with specific guidelines.


Figure  18  -  Diagram  of  DUT 

The error unit generated errors at different timing intervals to simulate single errors in
the TDTMR. These generated errors would lie dormant unless activated by a switch on the
base board. Upon activation it would generate four errors at four different timing intervals: one
microsecond, one millisecond, two seconds, and eight seconds.

The overall goal of this research was to irradiate just the Virtex 4 chip without the
baseboard. To help eliminate other possible errors and isolate the chip itself, specialized
connector cables (Figure 19) were constructed to allow just the chip to be placed under radiation
while keeping the base board protected from any possible radiation damage. No buffers were needed
for the chip to be fully functional. With the current cables, the chip itself can be two feet away
from the base board without any problems with timing and synchronization.

Figure 19 - Connector Cables between Chip and Baseboard

3.2.2 Controller Board

The Virtex-II Pro development system was the FPGA platform chosen to implement the controller
board logic. It contains a PowerPC processor, which was used to run the
diagnostic algorithm. The Virtex-II Pro is built in 130 nm technology with nine layers of
metal. Output pins were soldered onto the board to provide the ability to utilize ribbon cables.
Figure 20 shows the Virtex-II Pro board. Further description of the Virtex-II Pro can be found in
Appendix D.

Figure 20 - Virtex-II Pro Controller Board

VHSIC  Hardware  Description  Language  (VHDL)  controlled  the  basic  flow  of  how  the 
hardware  operated.  A  state  machine  was  developed  in  VHDL  on  the  controller  board  to  run  the 
basic  operations.  Figure  21  shows  the  state  diagram  used  for  the  controller  board. 


The remaining parts of the controller logic were written in C and run on the PowerPC. The
controller board cycles through nine-bit input vectors that come from a counter or
random module. These two modules can be interchanged during runtime by the user. A copy of
the DUT circuit is also placed in the controller board and acts as the ‘golden circuit’ for
comparison purposes. All the values from the DUT were compared to the ‘golden circuit’
results. Figure 22 illustrates the block diagram of the controller logic. Upon detecting an error,
the controller board automatically switches to ‘Diagnostic Mode’, in which all the test and
diagnostic vectors are applied as inputs. The PowerPC also runs the algorithm that calculates the
location of the error. This is discussed further later in this chapter.


Figure  22  -  Controller  Board  Block  Diagram 

3.2.3  External  Devices 

The two external devices used for the test platform are a laptop attached to the Virtex-II
Pro board via an RS232 (serial) cable and a compact flash card located on the board.

Hyper-terminal was used to communicate with the controller board. The hyper-terminal
acted as a monitor of all the operations. A user can check the number of errors detected, how
long the test has been running, how many vectors have been checked, and some of the input vectors with
their results from all the outputs. Since the controller board operates significantly faster than
the hyper-terminal can display, the input vectors and output results are randomly sampled for
display. The user can also switch among four different modes through the hyper-terminal, namely
counter, random, test, and debug. Figure 23 shows what the monitoring screen looks like.

[Figure: terminal window (COM1 - PuTTY) titled "Control Board Module Control Program", showing the prompt "Press 'q' to quit...", the line "Beginning run: 003", and fields for elapsed CPU time, result, vector source, and loop time (ns) with average, minimum, and maximum values]

Figure 23 - Hyper-Terminal Monitoring Screen

The  compact  flash  card  was  utilized  to  record  all  the  data  in  real  time  for  possible  post 
processing  analysis.  Due  to  the  volume  of  vectors  being  analyzed,  only  when  presented  with  an 
error  would  information  be  logged  onto  the  flash  card.  The  flash  card  recorded  which  input 
vector  caused  a  failure  along  with  that  vector’s  results.  It  also  documented  all  the  results  from 
each  of  the  test  and  diagnostic  vectors.  Figure  24  shows  an  example  of  what  the  output  on  the 
flash  card  would  look  like. 


[Figure: annotated sample of the flash card log. The annotations group the comma-separated fields as Vector ID, Source, Adder Results, Voted Result, Gold Circuit (GC), Input Values (IN_A, IN_B, IN_C), Fault Location List, and Time. Legible sample rows:]

10389989, Test, e, e, e, e, e, 8, 5, 1, f5dafedf, 3699.238542
10389990, Test, 4, 4, 4, 4, 4, a, 9, 1, f49af6c6, 3699.239392
10389991, Test, f, f, f, f, f, 1, e, 0, f4009646, 3699.239654
10389992, Test, 1, 1, 1, 1, 1, 5, c, 0, f4009640, 3699.239915

Figure 24 - Output of the Flash Card


The card would record the vector ID, the vector source, all the outputs of the adders and
voter, the gold circuit output, the input vectors, and the time at which the vector was executed. The
FPGA would write to a buffer, a temporary file, until two hundred kilobytes had accumulated before writing
it out to flash. This was done in case an error or problem caused the
FPGA to crash. If there was a failure while writing to flash, everything written into the buffer
would be lost or corrupted. The smaller files allow less data to be lost in case of a crash.
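
The buffering scheme can be sketched in software as follows. This is an illustrative model only, not the C code that ran on the PowerPC: the class name, the method names, and the use of an in-memory byte stream in place of the flash card are all assumptions. It shows why a fixed flush threshold bounds how much logged data is at risk if the system crashes.

```python
import io

FLUSH_THRESHOLD = 200 * 1024  # bytes accumulated before each write to flash


class BufferedErrorLog:
    """Collects error records in memory and writes them out in fixed-size chunks."""

    def __init__(self, sink, threshold: int = FLUSH_THRESHOLD):
        self.sink = sink              # any object with a write(bytes) method
        self.threshold = threshold
        self.pending = bytearray()    # data not yet written to the sink
        self.flush_count = 0

    def record(self, line: str) -> None:
        """Append one log line; flush once the threshold is reached."""
        self.pending += line.encode("ascii") + b"\n"
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Write everything buffered so far to the sink."""
        if self.pending:
            self.sink.write(bytes(self.pending))
            self.pending.clear()
            self.flush_count += 1


# Small demonstration with a 32-byte threshold instead of 200 KB.
sink = io.BytesIO()
log = BufferedErrorLog(sink, threshold=32)
for _ in range(3):
    log.record("10389989,Test,e")   # 16 bytes including the newline
at_risk = len(log.pending)          # bytes that a crash right now would lose
```

With this scheme a crash loses at most roughly one threshold's worth of records, which is the trade-off the paragraph above describes.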

3.2.4 Buffer Bridge

The Virtex 4 Mini-Module was originally chosen because it is small enough to fit down the
irradiator tube. Ribbon cable was used to communicate between the two boards. Because the
two boards do not share the same pin layout, a bridge board needed to be
constructed (Figure 25).


Figure  25  -  Bridge  Board 

The bridge board also serves as a buffer to clean up the signal degradation between the
two boards. Since there would be fifteen feet of cable between the controller board and the
DUT, high speed CMOS buffers were used to aid in cleaning up the signal. Full details and
specifications of the high speed CMOS buffers can be found in Appendix E. Figure 26 displays
a comparison of the signals with and without the CMOS buffers after fifteen feet. Another
addition to the bridge board was the use of IC sockets with capacitors. These sockets also aided in
cleaning up the signal. Appendix F contains the specifications of these sockets.


Figure  26  -  Signal  Comparisons 

3.3  Operation  Speed 

The Virtex-II Pro PowerPC can operate at clock speeds of 300 MHz, and the
Virtex 4 Mini-Module can operate at 100 MHz. The Virtex-II Pro board itself can operate at 100
MHz. In order to get the two boards to communicate properly with each other, the speed of each
board had to be reduced. The biggest hurdle was the distance between the two boards. Using a
clock pulse and an oscilloscope, measurements were made with fifteen feet of cable. With the
two boards fifteen feet apart, and with the aid of the buffer board, the link was able
to sustain a clean square pulse at 1 MHz.

Due to the nature of the diagnostic algorithm and the predetermined vector sets
(explained later in this thesis), a bottleneck developed between the PowerPC and the Virtex-II
Pro. The instantiated hardware required more processing time to handle the code from the
PowerPC. Thus, delay was added to the PowerPC processing to allow the Virtex-II Pro to execute all
its commands and remain in sync. An attempt was made to investigate this problem further;
however, no concrete solution was discovered in time. This bottleneck prevented the code from
reaching the potential of 1 MHz, or one million vectors per second. With the current code in place,
the system can process on average only 2,817 vectors per second. Possible enhancements are
noted in the future works section of this thesis.
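
To put these two rates in perspective, the short calculation below compares the time needed to apply every nine-bit input vector once at the target rate and at the measured rate. The variable and function names are hypothetical; the numbers are the ones quoted above.

```python
TARGET_RATE = 1_000_000   # vectors per second at the 1 MHz link clock
MEASURED_RATE = 2_817     # average vectors per second actually achieved
NUM_INPUTS = 9            # input bits applied to the DUT

VECTORS_PER_SWEEP = 2 ** NUM_INPUTS   # every possible input combination


def sweep_time_seconds(rate: float) -> float:
    """Time to apply every input combination once at the given vector rate."""
    return VECTORS_PER_SWEEP / rate


ideal_sweep = sweep_time_seconds(TARGET_RATE)      # about half a millisecond
actual_sweep = sweep_time_seconds(MEASURED_RATE)   # a little under 0.2 s
slowdown = TARGET_RATE / MEASURED_RATE             # roughly 355 times slower
```

Any upset that clears itself in less time than one full sweep at the measured rate may therefore go unobserved for some input combinations, which is one reason the bottleneck matters.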

3.4  Fault  Detection  and  Diagnosis 

This section describes the process and methodology used to develop the fault
detection and diagnosis algorithm. The method used to detect faults was similar to previous
work done in this field [17], with a few differences. The diagnosis algorithm was developed by
utilizing the TESTCAD test generation and fault simulation tool sets. The full algorithm flow
chart is displayed in Figure 27.

Figure 27 - Fault Detection and Diagnosis Algorithm

3.4.1  Fault  Detection 

The algorithm runs a simple loop of generating inputs and checking the outputs. The
controller board generates one of the 2^N possible input combinations, ‘N’ representing the
number of inputs, to be put through the DUT. With nine inputs, there are 2^9 or 512 possible
unique input combinations. Once the controller board receives the results for the input vector
after it has travelled to the DUT, it performs a bitwise comparison of the results of the three
adders and the voter logic against the ‘gold circuit’. The bitwise comparison determines which adder
caused an error and which bit of that particular adder was wrong. The error sends an interrupt to
the processor to begin the diagnosis of the error.
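
The detection loop can be sketched as below. This is a simplified software model rather than the VHDL and C on the controller board. It assumes the nine-bit vector packs two 4-bit operands and a carry-in, that each result is five bits wide, and that the DUT is represented by a callable returning one result per output. The names `detect`, `compare`, and `faulty_dut` are hypothetical.

```python
RESULT_BITS = 5  # assumed: 4 sum bits plus carry-out


def golden_add(a: int, b: int, cin: int) -> int:
    """Reference ('gold circuit') result for one input vector."""
    return (a + b + cin) & ((1 << RESULT_BITS) - 1)


def unpack(vector: int):
    """Split a 9-bit vector into (a, b, cin); the bit layout is an assumption."""
    return (vector >> 5) & 0xF, (vector >> 1) & 0xF, vector & 1


def compare(outputs: dict, expected: int) -> dict:
    """Return {output name: [bit positions that differ from the gold result]}."""
    errors = {}
    for name, value in outputs.items():
        diff = value ^ expected   # XOR leaves a 1 wherever the bits disagree
        if diff:
            errors[name] = [i for i in range(RESULT_BITS) if (diff >> i) & 1]
    return errors


def detect(dut, vectors=range(2 ** 9)):
    """Apply vectors until one mismatches; return (vector, errors) or None."""
    for vector in vectors:
        a, b, cin = unpack(vector)
        errors = compare(dut(a, b, cin), golden_add(a, b, cin))
        if errors:
            return vector, errors
    return None


def faulty_dut(a: int, b: int, cin: int) -> dict:
    """Example DUT model in which bit 2 of the ripple carry adder is stuck at 1."""
    good = golden_add(a, b, cin)
    return {"CLA": good, "RCA": good | 0b00100, "BA": good, "Voter": good}


first_error = detect(faulty_dut)
```

In this example the voter output still matches the gold result, while the comparison isolates the faulty adder and bit, which is the information passed on to the diagnosis step.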

3.4.2  Fault  Diagnosis 

The process of diagnosing the exact location of faults within a circuit was a two step
process. The first step was to utilize the TESTCAD test generation and fault simulation tool set
in order to develop the test vectors and the individual fault lists. The second step, after creating
these lists, was to develop an algorithm to detect the fault location with the best possible
resolution for stuck-at fault modeling.

3.4.2.1 Test Vector Generation

Because the adders were created structurally, creating the fault list was easier. The
TESTCAD tools allowed for the creation of all the possible faults that could be found in a
structural design, along with optimizing and reducing the fault list by equivalence and then
by dominance. The fault reduction reduced the number of possible faults by 16-18%.
The TESTCAD tools evaluated the circuit combined with the fault list and were able to provide
100% fault efficiency. This meant that all the faults within the circuit were detected. The next
goal was to find out how many faults each vector was able to uncover. Each of the 2^N possible input
combinations was simulated individually to list all the possible fault locations detected by
that particular input vector. Appendix G has a brief description of all the
TESTCAD tool commands.

The data collected from the TESTCAD tools were tabulated in a large table.
Figure 28 depicts a small portion of the table, which lists the fault locations found with each
test vector along the ‘x’ axis and their respective vector outputs along the ‘y’ axis. A full
detailed list containing all the vectors, faults, and detectable faults can be found in Appendix H.

Figure 28 - Layout of Detected Faults

Each individual test vector was able to detect thirty faults on average. The final goal was to
reduce the number of vectors to the smallest set that would still cover every possible error. The
table was sorted by each of the output bits, which showed that many of the test vectors were
redundant. After eliminating all the redundant vectors, around twenty-four vectors remained. The
average number of faults detected by each vector for each output bit was reduced to eight. From
here, the lists of test vectors, diagnostic vectors, and their particular fault lists were made.
These lists of vectors and their respective fault lists were stored on the controller board.
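One common way to automate this kind of reduction is a greedy covering heuristic. The sketch below is not the sorting procedure used in this research; it is a swapped-in technique, shown only to illustrate how redundant vectors can be dropped while every fault stays covered. The vector and fault names are hypothetical.

```python
def reduce_vectors(detects):
    """Greedy cover: keep the vector that detects the most still-uncovered faults, repeat.

    `detects` maps a vector name to the set of faults that vector detects.
    """
    uncovered = set().union(*detects.values())
    kept = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name in sorted order wins).
        best = max(sorted(detects), key=lambda v: len(detects[v] & uncovered))
        kept.append(best)
        uncovered -= detects[best]
    return kept


# Hypothetical detection table: four vectors, four faults.
example = {
    "v0": {"f1", "f2"},
    "v1": {"f2", "f3", "f4"},
    "v2": {"f1"},
    "v3": {"f4"},
}
kept = reduce_vectors(example)
```

A greedy cover is not guaranteed to be minimal, but it never leaves a detectable fault uncovered, which is the property the reduced vector list needs.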


3.4.2.2 Diagnosis Algorithm


With the test vectors and fault lists stored on the controller board, an algorithm was developed
in an effort to detect the location of a single fault with the best possible resolution. By
following a methodology similar to Deductive Fault Simulation [9], an inverse deductive fault
detection algorithm was developed.


The goal of this algorithm was to diagnose a single fault in the entire system. Upon determining
which adder and which bit caused the error, the algorithm would choose the appropriate test vector
list and fault list to start inverse deductive fault detection. A possible fault list array was
created and populated with values of ‘1’, which represent the possible locations of faults. Each
of the test and diagnostic vectors would be processed through the system and the results would be
evaluated. Whether a vector passed or failed determined how the elimination process on the fault
list array was conducted. The algorithm would continue in a loop until only one candidate fault
remained or the end of the list was reached. If more than one fault remained at the end of the
list, there was no possible way to distinguish the single fault among them, and the remaining
faults were the best possible resolution for fault diagnosis.
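The elimination loop can be sketched as follows. This is a minimal illustration under a single-fault assumption, not the controller-board implementation: the pass/fail rule (a failing vector keeps only the faults it can detect, a passing vector removes them) is an assumed reading of the description above, and `run_vector`, `fault_masks`, and the bit layout are hypothetical.

```python
def diagnose(vectors, fault_masks, run_vector, n_faults=32):
    """Narrow a bitmask of candidate faults; bit i set means fault i is still possible.

    vectors     -- ordered test and diagnostic vectors
    fault_masks -- for each vector, a bitmask of the faults that vector detects
    run_vector  -- callable returning True if the DUT output matched the expected output
    """
    candidates = (1 << n_faults) - 1  # start with every fault possible (all ones)
    for vec, mask in zip(vectors, fault_masks):
        if run_vector(vec):
            # Vector passed: none of the faults it detects can be present.
            candidates &= ~mask
        else:
            # Vector failed: the fault must be one of those it detects.
            candidates &= mask
        if bin(candidates).count("1") <= 1:
            break  # resolved to a single location (or nothing left)
    return candidates


# Hypothetical 4-fault example where the real fault is fault index 1.
masks = [0b1100, 0b1010, 0b0110]


def fake_dut(vec):
    # The vector fails exactly when its mask covers the real fault (bit 1).
    return not ((masks[vec] >> 1) & 1)


result = diagnose([0, 1, 2], masks, fake_dut, n_faults=4)
```

When the vector list is exhausted with several bits still set, the returned mask is the best available resolution, matching the behavior described above.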

The possibility of having multiple bit failures did present an issue. The scope of the current
algorithm is to diagnose the location of a fault when only one bit has failed. When multiple bit
failures were present, the algorithm ran through a set of test vectors without attempting to
perform the diagnosis. The results were stored on the flash card for post processing analysis.

3.5  Test  Setup 

The ultimate goal for this research was to test the algorithm under a form of radiation. To
verify the validity of the algorithm, different test setups were created. Hardware fault
injection, gamma radiation, thermal radiation, an optical laser, and an optical flash were the
testing environments. Each setup utilized the test platform in its entirety.

3.5.1  Injected  Fault  Setup 

This test was conducted in an open lab. It utilized the same full test platform described above.
Four faults were injected in random locations in the structural design of the adders. The error
generator rotated through the four different faults one at a time for specific durations. The
test platform executed for seventy-two hours straight with no errors to exercise its durability.
Faults were injected manually at random points throughout the testing process.

3.5.2 Gamma Radiation Test Setup

The  test  platform  was  originally  constructed  to  be  tested  under  gamma  radiation  at  the 
OSUNRL.  The  DUT  was  separated  from  the  base  board  and  placed  eight  inches  from  the 
bottom  based  on  Figure  10  to  achieve  the  highest  dose  rate  per  hour  (see  Chapter  2).  The  base 
board  was  strapped  above  the  DUT  separated  by  a  thick  piece  of  lead  to  be  protected  from  the 
radiation.  The  bridge  board,  controller  board,  and  the  remaining  devices  stayed  fifteen  feet 
above  the  reactor.  Figure  29  depicts  the  entire  set  up  used  for  the  gamma  radiation  test. 


Figure  29  -  Gamma  Radiation  Test  Setup 


3.5.3  Thermal  Radiation  Test  Setup 

Thermal radiation was chosen as another means of generating errors on the DUT. In a previous
thesis [19], a Newport Xe Solar Simulator was assembled with the ability to provide an irradiance
of 3.3 cal/(cm²·s). Figure 30 depicts the setup used for this irradiation test.

Figure 30 - Thermal Radiation Test Setup

A two inch diameter fused silica plano-convex lens was inserted 95 mm after the focusing optic of
the setup. The focal length of this second external optic is 150 mm. However, to increase the
homogeneity of the beam, the target was placed closer to the second external optic than the focal
length [18]. A pinhole was constructed and placed between the two inch silica lens and the FPGA to
ensure that the thermal radiation was focused onto the FPGA alone. Figure 31 provides a diagram
with detailed specifications of the entire setup.


Figure 31 - Thermal Radiation Specification Setup

3.5.4 Optical Laser Test Setup

Another test used a COTS laser pen to cause errors. The test setup consisted of a laser pointer,
a pinhole plate, and the DUT. All the lasers were green with a wavelength of 532 nm. Three laser
pens of different power were used in an attempt to create errors, rated at 10 mW, 20 mW, and
50 mW. The cover of the FPGA on the DUT was removed for this test. The pinhole in the plate was
600 micrometers in diameter.

The laser pointer was set up with the pinhole over the laser to get a more focused target on the
uncovered FPGA. The distance to the FPGA was 73 mm. Each laser was focused on one corner of the
FPGA. Figure 32 depicts the entire setup used for the optical laser test.


Figure  32  -  Optical  Laser  Test  Setup 

3.5.5  Optical  Flash  Test  Setup 

The last test setup available for this research was the optical flash test. The goal of this test
was to see if the Electromagnetic Interference (EMI) of a professional camera flash device would
cause errors on the DUT. Figure 33 depicts how the test setup was arranged.

Figure 33 - Optical Flash Test Setup

The flash device was controlled digitally through the photographic concept known as the f-stop.
F-stops are powers of $\sqrt{2}$. The first f-stop is $(\sqrt{2})^0$, or f/1. Next is
$(\sqrt{2})^1$, or f/1.4, then $(\sqrt{2})^2$, or f/2, etc. [24]. For each increment of the f-stop
number the output wattage doubles: f/1 starts with an output wattage of 18.75, f/2's output
wattage is 37.5, f/3's output wattage is 75, and so forth.
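The two relationships stated above can be written out as a short sketch. The function names are hypothetical, the doubling rule is applied only to the whole-number settings listed in the text, and how the flash unit scales fractional settings is not stated here, so it is not modeled.

```python
import math


def f_number(stop_index):
    """f-number of the k-th stop, following the powers-of-sqrt(2) rule above."""
    return math.sqrt(2) ** stop_index


def flash_output_watts(setting, base_watts=18.75):
    """Output wattage for a whole-number flash setting, doubling per increment.

    Setting 1 corresponds to the 18.75 W starting value quoted in the text.
    """
    return base_watts * 2 ** (setting - 1)


outputs = [flash_output_watts(n) for n in range(1, 7)]
```

Under this rule the sixth setting reaches 600 W, which agrees with the maximum flash output reported in the results chapter.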

Three different DUT tests were conducted. One was with a brand new mini module with no
modifications. The other two tests used another brand new module with the outermost protective
plate removed. One of those setups had the DUT completely exposed to the flash. The other setup
used a 600 micrometer pinhole to focus the flash onto one part of the exposed DUT.

3.6  Summary 

This chapter described the entire process of developing the test platform and all the components
associated with it. The development of the test and diagnostic vectors was discussed, including
how they were derived, set up, and implemented. The entire detection and diagnosis algorithm was
also described in this chapter. Each of the five test setups was described in detail, including
how it was constructed for testing the goals of this research. Further code, diagrams,
specification sheets, and other pertinent information can be found in the Appendix.

IV. Results & Analysis


4.1  Chapter  Overview 

This  chapter  covers  all  the  results  and  analysis  performed  for  this  research  as  described  in 
the  previous  methodology  chapter.  Each  section  covers  all  the  results  from  all  the  test  setups 
used  to  obtain  errors. 

4.2  Test  Setup  Results 

The  five  subsections  describe  the  results  gathered  from  each  of  the  five  tests.  Some  tests 
were  more  successful  than  others.  A  detailed  account  of  each  experiment  was  recorded  and 
written  out  in  this  chapter. 

4.2.1  Injected  Fault  Results 

The hardware fault injection tests were completed. The objective of detecting a fault within the
circuit and diagnosing its location in real time was achieved. The four hardware injected faults
were located with the best possible resolution through the algorithm described in Chapter 3. The
faults were randomly placed in the structural design of either the Ripple Carry Adder (Figure 34)
or the Carry Look Ahead Adder (Figure 35). Three of the four faults were detected and the
diagnostic algorithm pinpointed them exactly on the circuit diagram. One of the faults could not
be located exactly; however, the algorithm reduced the number of possible locations to three.
These three candidate faults cannot be uniquely distinguished from one another without fully
destroying the circuit, so with this method it is the best possible solution that can be found.
Figure 36 shows the results printed to the flash device, depicting how the algorithm reduces the
number of faults in an effort to pinpoint the exact fault.

Figure 34 - Injected Fault Locations for Ripple Carry Adder

Figure 35 - Injected Fault Locations for Carry Look Ahead Adder



10402451,Count,0,0,8,0,0,5,b,0,ffffffff,3704.42043
10402452,Test,f,f,f,f,f,6,8,1,f5fabd3f,3704.420779
10402453,Test,f,f,f,f,f,1,e,0,f1fabd3f,3704.421125
10402454,Test,e,e,e,e,e,8,5,1,f1dab03f,3704.421470
10402455,Test,8,8,8,8,8,2,5,1,f192101f,3704.421816
10402456,Test,4,4,c,4,4,6,e,0,00001019,3704.422162
10402457,Test,4,4,4,4,4,a,9,1,00001019,3704.422507
10402458,Test,b,b,b,b,b,e,d,0,00000019,3704.422853
10402459,Test,8,8,8,8,8,a,d,1,00000019,3704.423198
10402460,Test,3,3,b,3,3,b,7,1,00000019,3704.423544
10402461,Test,4,4,4,4,4,4,0,0,00000019,3704.423890


10405304,Count,f,7,f,f,f,a,4,1,ffffffff,3705.89845
10405305,Test,3,3,3,3,3,6,d,0,ed6ffee5,3705.899313
10405306,Test,1,1,1,1,1,d,4,0,ec0feee5,3705.899569
10405307,Test,a,a,a,a,a,c,d,1,ec05e265,3705.899825
10405308,Test,1,1,1,1,1,1,f,1,e405e265,3705.900083
10405309,Test,8,8,8,8,8,9,f,0,e405e265,3705.900339
10405310,Test,3,3,3,3,3,7,b,1,e005e265,3705.900596
10405311,Test,9,9,9,9,9,5,3,1,e005e025,3705.900872
10405312,Test,c,4,c,c,c,0,c,0,00054020,3705.901217
10405313,Test,5,5,5,5,5,6,f,0,00054020,3705.901563
10405314,Test,7,f,7,7,7,6,0,1,00000020,3705.901909


Figure  36  -  Injected  Fault  Results 

The circled results are represented in hexadecimal. Each hexadecimal digit represents four fault
locations. The first result of 00000019 indicates that the algorithm narrowed the fault down to
three possible locations. The second result of 00000020 represents the exact location of the
fault uncovered.
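The reading of these hexadecimal results can be checked with a short sketch. It assumes each set bit marks one remaining candidate fault and numbers the bits from the least significant end; that numbering, and the function name, are assumptions made for illustration rather than the mapping used on the controller board.

```python
def fault_locations(hex_mask):
    """Return the bit positions (0 = least significant) that are set in a hex fault mask."""
    value = int(hex_mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


three_candidates = fault_locations("00000019")  # 0x19 = 0b11001
exact_fault = fault_locations("00000020")       # 0x20 = 0b100000
all_candidates = fault_locations("ffffffff")    # starting state of every search
```

The first mask has three bits set and the second has one, which matches the three possible locations and the single exact location described above.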

4.2.1.1 Analysis

These results demonstrate that the inverse deductive algorithm for fault detection and diagnosis
worked. The algorithm correctly detected each of the injected faults individually and properly
reduced the number of possible fault locations to the best possible solution. These faults were
each injected one at a time. The algorithm's goal was to detect single errors within a circuit,
but it also takes multiple errors into account. However, it is not able to fully diagnose the
exact location of multiple errors. In that case the algorithm runs a set of test vectors and the
results are analyzed in post processing.

4.2.2 Gamma Radiation Results


Four  separate  tests  were  conducted  at  the  OSUNRL.  The  first  test  used  a  Spartan  series 
chip  that  was  separated  from  the  baseboard.  The  maximum  dose  rate  of  69  krad(Si)/hr  was  set 
based  on  the  dose  rate  curve  found  in  Chapter  2.  No  errors  were  detected  for  the  first  fifteen 
hours,  but  soon  after,  the  chip  experienced  a  catastrophic  failure  and  no  values  were  obtained. 
The  chip  itself  was  damaged  beyond  repair  and  not  able  to  be  reprogrammed. 

In an effort to cause errors, the next three tests kept the baseboard and the DUT attached. The
first of these tests lasted for forty-eight minutes before the device was removed from radiation.
The result log showed a sporadic array of random results with no consistent outcome. All the
adders and the voter logic produced multiple random results. No clear, traceable set of results
was obtained in the result logs. After the DUT was pulled from radiation, the baseboard was found
to be damaged and inoperable. However, despite the damaged baseboard, the DUT remained
operational. Figure 37 shows a portion of the error log from this test. The full results are
similar to the portion displayed in this figure.


5427461,Test,f,f,f,f,3,6,d,0,ffffffff,4763.531108
5427462,Test,f,c,e,f,7,6,0,1,ffffffff,4763.531453
5427463,Test,f,c,0,1,3,7,b,1,ffffffff,4763.531799
5427464,Test,f,d,b,1,d,a,2,1,ffffffff,4763.532145
5427465,Test,3,8,0,1,f,0,e,1,ffffffff,4763.532490
5427466,Test,2,8,0,1,1,1,f,1,ffffffff,4763.532836
5427467,Test,2,0,0,0,9,5,3,1,ffffffff,4763.533181
**REPEATS**

5427487,Test,0,0,0,0,8,9,f,0,ffffffff,4763.540395
5427488,Test,0,7,f,8,4,8,b,1,ffffffff,4763.540651
5427489,Test,0,7,f,8,a,c,d,1,ffffffff,4763.540908
5427490,Test,0,0,0,0,8,b,c,1,ffffffff,4763.541168
5427491,Test,0,7,f,b,6,d,0,ffffffff,4763.541476
5427492,Test,9,7,f,f,7,6,0,1,ffffffff,4763.541821
5427493,Test,9,f,f,f,3,7,b,1,ffffffff,4763.542167
5427494,Test,d,f,f,f,d,a,2,1,ffffffff,4763.542512
5427495,Test,d,f,f,f,f,0,e,1,ffffffff,4763.544415
5427496,Test,f,f,f,f,1,1,f,1,ffffffff,4763.544675


Figure  37  -  Gamma  Radiation  Test  2,3  Results 


The third test was set up in a similar fashion to the second test. This test lasted for an hour
and nineteen minutes before numerous errors showed up. The third test lasted longer, but the
results within the log were the same as those in Figure 37. Aside from the duration of the test,
the only other difference between the two tests was that Test 3's baseboard and DUT were both no
longer operational, whereas Test 2's DUT was still operational.

The final test for gamma radiation produced different results. The device was raised to seventeen
inches from the bottom of the radiation chamber. Raising the device effectively reduced the dose
rate to 35 krad(Si)/hr based on Figure 11 found in Chapter 2. After two and three quarter hours,
the baseboard and the DUT were still operational after being pulled from the radiation. There
were a total of thirty-four traceable errors, which arrived in three groups over time. Figure 38
shows the timeline for all four gamma radiation tests along with the groups of errors detected in
Test 4.

[Timeline along a Time (hr) axis marking when Test 2, Test 3, Test 4, and Test 1 were pulled, with
the Test 4 error groups labeled Errors 1-7, Errors 7-17, and Errors 17-34]

Figure 38 - Gamma Radiation Timeline

The three groups of errors, based on the logs, were consistent and traceable. The voting logic
produced errors at bit zero. The results showed that bit zero of the voting logic behaved as a
stuck-at-zero fault, meaning that this bit of the output remained zero for a period of time.
Table 3 presents an example of how the results represent a bit stuck at zero. Finally, Table 4
shows a summary of all four test results taken during the gamma radiation test.

Table 3 - Stuck at Zero Faults

Behave Adder   CLA Adder   RC Adder   Voter      Gold Circuit
D - 1101       D - 1101    D - 1101   C - 1100   D - 1101
3 - 0011       3 - 0011    3 - 0011   2 - 0010   3 - 0011
9 - 1001       9 - 1001    9 - 1001   8 - 1000   9 - 1001
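The pattern in Table 3 can be checked mechanically. The sketch below is illustrative only: it takes the voter and gold circuit values from the table and reports every bit position where the voter read zero each time the gold circuit read one. The function name and the four-bit width are assumptions.

```python
def stuck_at_zero_bits(samples, width=4):
    """Return bit positions where the voter was 0 in every sample whose gold bit was 1.

    `samples` is a list of (voter_value, gold_value) pairs.
    """
    suspects = []
    for bit in range(width):
        # Only samples with this gold bit set can expose a stuck-at-zero fault.
        exposing = [(v, g) for v, g in samples if (g >> bit) & 1]
        if exposing and all(((v >> bit) & 1) == 0 for v, _ in exposing):
            suspects.append(bit)
    return suspects


# (voter, gold) pairs taken from the rows of Table 3.
TABLE_3 = [(0xC, 0xD), (0x2, 0x3), (0x8, 0x9)]
suspect_bits = stuck_at_zero_bits(TABLE_3)
```

Only bit zero is flagged for these rows, consistent with the stuck-at-zero behavior described for the voting logic.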

Table 4 - Gamma Radiation Results Summary

Test #   Chip Type   Baseboard   Dose Rate       Duration   Baseboard     DUT           Vectors Checked
                     Attached    (krad(Si)/hr)   (hrs)      Operational   Operational   * 1000
1        Spartan 3   No          69              15         Yes           No            150861
2        FX (1)      Yes         69              0.8667     No            Yes           7269
3        FX (2)      Yes         69              1.333      No            No            15482
4        FX (3)      Yes         35              2.75       Yes           Yes           26470

4.2.2.1 Analysis

The overall results from the gamma radiation test showed that the test platform created for this
test was successful. The detection portion of the algorithm, which determines whether errors
exist, proved successful. In Test 1 the catastrophic failure of the chip rendered the results
inconclusive. However, the test did show that the separated DUT with its smaller technology
(90 nm) was more resilient to gamma radiation. With smaller technology, the gate oxide traps less
positive charge overall. With less trapped charge due to radiation, the transistors have a better
chance of operating normally. Tests 2 and 3 confirmed that the baseboard is more susceptible to
gamma radiation than the DUT itself, as shown in previous work [17]. One could speculate that the
cause is that the baseboard was not designed for harsh environments. The designers were not
interested in creating a board to withstand gamma radiation, only in providing a connection port
for the DUT.

The final test showed that the algorithm was able to track the single bit error in the voting
logic. However, it was not able to diagnose the error because the error would disappear after
0.00069 seconds on average. The logs tracked thirty-four errors before the device was pulled out
of the radiation. The results show that the number of errors in each ‘grouping’ increased over
time. This would be expected since the DUT was constantly exposed to gamma radiation. If the
device had remained in radiation, more errors would potentially have occurred due to the
threshold voltage dropping, as in Figure 9 (Chapter 2).

Table 5 - Thermal Radiation Results Summary

Test #   Chip Type   Duration      Start Temp   Peak Temp   Final Output
                     (mins:secs)   (°C)         (°C)        Wattage
1        FX(1)       11:07         34.8         440.9       1800
2        FX(2)       13:06         36.1         392.7       1350
3        FX(3)       18:39         42.8         298.1       1300
4        FX(4)       21:35         35.5         301.8       1290

The tests performed with gamma radiation do suggest that there might be a difference between the
effects of total ionizing dose and dose rate. The two tests with the highest dose rate failed
faster than the board exposed to about half the dose rate, which remained under radiation for more
than double the time of the other two tests. This would be similar to skin exposure to the sun's
harmful UV rays. On a bright sunny day with no form of protection, skin has a higher chance of
being damaged in a period of one hour than on a cloudy day in a period of two hours. The skin
would be exposed to similar total amounts of UV rays, but direct sunlight can potentially cause
more immediate damage. In the end, the total dose rate does have an effect on the threshold
voltage. As shown in Figure 4 (Chapter 2), the threshold voltage could have shifted, causing the
sets of errors from Test 4 to appear and then disappear in groups. More tests would be needed to
verify this, but there are promising results from the outcome of this test setup.
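The dose comparison can be made concrete with a rough calculation. The sketch below multiplies the dose rate by the duration listed in Table 4 for each test; it assumes the dose rate stayed constant for the whole exposure, which is a simplification, and the names used are hypothetical.

```python
# (dose rate in krad(Si)/hr, duration in hours) for each gamma test, taken from Table 4.
GAMMA_TESTS = {1: (69, 15), 2: (69, 0.8667), 3: (69, 1.333), 4: (35, 2.75)}


def total_dose_krad(rate_krad_per_hr, hours):
    """Accumulated dose in krad(Si), assuming a constant dose rate."""
    return rate_krad_per_hr * hours


totals = {test: total_dose_krad(*params) for test, params in GAMMA_TESTS.items()}
```

Under that assumption Test 4 accumulated a total dose comparable to, and slightly above, that of Tests 2 and 3 while remaining operational, which is in line with the suggestion that dose rate, and not only total dose, influenced the failures.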

4.2.3  Thermal  Radiation  Results 

Exposing the DUT to thermal (heat) radiation was another test performed in an effort to detect
errors. Four tests were conducted on four separate DUTs. Table 5 shows a summary of the results
of all four tests for thermal radiation. The duration column indicates how long it took before
complete chip failure. The first two tests were conducted by increasing the wattage quickly over
a shorter period of time. In these two tests the chip failed catastrophically at higher
temperatures. The log did show errors in the voting logic at bit one before the chip failed
altogether with all outputs being zero. Across the first two tests there were more than a dozen
single voter errors spanning five hundred milliseconds before everything failed.

The last two tests had the output wattage slowly increase over a longer span of time. In these
tests, the chip failed around 300°C. Though the chip took longer to fail and failed at lower
temperatures, no traceable errors were detected. The FX series chip was only rated for
temperatures in the range of zero to eighty-five degrees Celsius. Because this range was greatly
exceeded, the chip failed catastrophically, with all zeros at the output.

4.2.3.1 Analysis

The results for this test again showed that the voter logic was the first to fail, before any of
the adders. Because the DUT tends toward thermal equilibrium, a direct pinpoint form of thermal
radiation could not be achieved: the entire body of the DUT would heat up to roughly the same
temperature as the directed pinpoint of the beam. In semiconductors, electrical conductivity
increases with increasing temperature. Silicon's material properties, including its thermal
conductivity, are typically specified at three hundred Kelvin, which is equivalent to about
twenty-seven degrees Celsius, far below the temperatures reached in these tests. One could
speculate that with the quick ramp up of wattage, thermal equilibrium was not fully reached,
allowing the chip to achieve higher temperatures before failing. The high temperatures did not
fully diffuse across the DUT, possibly causing partial damage. Once the rest of the chip caught
up with the focal point of the beam, the entire chip completely failed.

Table 6 - Optical Laser Results Summary

Test #   Chip Type   Duration (min)   Laser (mW)   Errors
1        FX(1)       45               10           None
2        FX(1)       45               20           None
3        FX(1)       45               50           None

In the other tests the temperature was ramped up more slowly. These tests did not produce any
constructive results. The chip failed completely when the heat created a short. The slow method
of increasing the temperature allowed thermal equilibrium to be reached across the entire chip
before stepping to the next level. By waiting longer, the entire chip would reach the same
temperature and fail completely without a portion of the chip being damaged first. In the end,
the results for this test did show that the system was able to detect some single errors before
completely crashing.

4.2.4  Optical  Laser  Results 

Another attempt to cause errors on the DUT used a laser pointer. Three tests were conducted, one
with each of the three lasers. Table 6 depicts the results gathered from the experiment. The
lasers were focused onto the same corner of the DUT, where a few of the outputs would have been
located. After forty-five minutes, it was determined that the test would not cause any errors.

4.2.4.1 Analysis

After careful examination of the DUT, it was determined that another metal plate was in place to
protect the innermost logic of the FPGA. Attempts to remove the second metal plate with chemicals
and small blades, without permanently damaging the chip itself, proved fruitless. The possibility
that the laser could create enough heat to produce errors was considered. However, since the
lasers were COTS devices, they were not able to last long enough on battery power. It was
concluded that the laser test would not be able to induce any errors on the DUT with the second
metal plate in place.

Table 7 - Optical Flash Results Summary

Test #   Chip Type   Flash Output (W)   Covering                   Results
1        FX(1)       600 (Max)          Yes                        None
2        FX(2)       18.75 (Min)        No Lid                     Crashed
3        FX(2)       428.4              No Lid / 600 um pinhole    Crashed

4.2.5  Optical  Flash  Results 

This was the last test that was available and performed. The goal of this test was to use a flash
to cause Electromagnetic Interference (EMI) in the DUT. Table 7 is a summary of the results
collected from each test.

The first test used a brand new DUT, completely intact. The DUT was completely resistant to the
output of the flash all the way up to its maximum, 600 W. No errors or damage to the DUT came
about with this test. The second test used a DUT that had the top plate removed. With this plate
removed, the DUT was completely susceptible to EMI. At the lowest setting, 18.75 W, the DUT reset
its values. Even a camera flash was able to reset the DUT. However, in this test the DUT was able
to be reprogrammed and used again. The final test used the 600 micrometer pinhole in an effort to
direct the flash onto one portion of the DUT. With the f-stop setting of 4.2 (428.4 W), the flash
device was able to reset the DUT. Further tests were conducted to confirm this value. The DUT
would consistently fail at the f-stop setting of 4.2 but not at 4.1.

4.2.5.1 Analysis

The results indicate that the outer plate of the DUT helped protect the device from EMI and other
potentially harmful effects. Removing this plate exposed the device and made it more susceptible
to EMI. The simple flash of a camera was able to cause enough EMI in the circuit to cause a
voltage drop throughout the entire FPGA. With the voltage dropped, all the thresholds were
lowered and the resulting outputs from the FPGA turned into 1's. Adding a pinhole plate over the
DUT shielded out some of the EMI at the low output settings. However, once the setting reached
4.2, the DUT became subject to EMI, causing it to fail. Even with a pinhole, the EMI was not
directed toward one portion of the DUT. The entire FPGA failed.

4.3  Summary 

This chapter discussed all the results obtained from the five tests to detect and diagnose errors
within the DUT. Some tests were partially successful. In short, some of the tests were able to
validate the detection portion of the algorithm. The injected fault test was the only test to
exercise the diagnosis part of the algorithm. This was expected since the entire experiment was
run through hardware and was completely controlled by a user. It was disappointing that the
remaining tests did not allow single errors on a circuit to be fully diagnosed. The gamma and
thermal radiation tests were able to detect single errors in the voter logic, but were not able
to pinpoint the exact location of the fault. The gamma radiation test had its errors disappear,
most likely because the SEEs created only temporary damage. The thermal radiation test drove the
DUT well beyond its rated operating temperature, causing it to fail completely. The optical laser
test was not able to produce any real results because of the extra metal plating. Finally, the
optical flash test caused enough EMI to create soft errors when the DUT was completely exposed.
Attempting to pinpoint the flash did not help with any of the results. The pinhole in the end
helped protect the chip from the EMI of the flash device. All in all, these tests do show that
circuits are still susceptible to multiple forms of radiation, which leaves ample room for further
research to help mitigate errors within a circuit. All result logs can be found in Appendix I.

V. Conclusion


5.1  Chapter  Overview 

A  final  wrap  up  is  discussed  in  this  section  along  with  notes  and  ideas  for  future  work 
that  could  be  completed  to  enhance  this  particular  area  of  research.  The  conclusion  covers  the 
success  of  the  implementation  of  the  testing  platform  and  diagnostic  algorithm.  The  TDTMR 
conclusion  is  also  discussed  in  this  chapter. 

5.2  Conclusion 

The first step in this thesis was correcting the multiple mistakes made in previous work in this
area, as described in Chapter 2. Along with correcting mistakes, three main goals were the focus
of this research: construct a test platform to evaluate circuit designs under various forms of
radiation, develop an algorithm to detect and diagnose errors within a circuit, and finally study
a new design, TDTMR.

The first goal, constructing a test platform to evaluate a circuit while under radiation,
was accomplished. The test platform was fully developed to allow a DUT to be placed under
hazardous conditions while the control board performs a full test vector evaluation. The
platform also corrects and enhances the designs from previous work in this area of research.
It was used in all five test setups and could be adapted to other test setups. The entire
test setup replaces numerous machines and saves the many man-hours otherwise needed to set
up a testing environment.

The goal of developing a method to detect errors and diagnose their location in real time
was accomplished. The algorithm showed its full functionality during the manually injected
fault test. It accurately detects single errors and is capable of locating them in the DUT.
If multiple errors occur at the same time, the algorithm can detect the incorrect outputs,
but diagnosing their locations is beyond its scope. The result logs show that the algorithm
accounted for multiple errors.
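
The distinction between detecting and locating a fault can be illustrated with a small fault-dictionary sketch. This is not the algorithm developed in this thesis; it is a generic illustration under stated assumptions: a one-bit full adder with hypothetical net names (x1, a1, a2, s, cout), a single stuck-at fault model, and an exhaustive set of test vectors. Each single fault is simulated once to record its output signature, and an observed signature is then looked up in that table.

```python
from itertools import product

# Hypothetical net names for a one-bit full adder (illustration only).
NETS = ("x1", "a1", "a2", "s", "cout")
VECTORS = list(product((0, 1), repeat=3))  # exhaustive (a, b, cin) vectors


def full_adder(a, b, cin, faults=None):
    """Evaluate the adder; `faults` maps a net name to its stuck-at value."""
    faults = faults or {}

    def drive(net, value):
        # A faulty net ignores its computed value and holds the stuck value.
        return faults.get(net, value)

    x1 = drive("x1", a ^ b)
    a1 = drive("a1", a & b)
    a2 = drive("a2", x1 & cin)
    s = drive("s", x1 ^ cin)
    cout = drive("cout", a1 | a2)
    return s, cout


def signature(faults=None):
    """Outputs for every test vector, in a fixed order."""
    return tuple(full_adder(*v, faults=faults) for v in VECTORS)


GOLDEN = signature()


def build_dictionary():
    """Map each single stuck-at signature to the faults that produce it."""
    table = {}
    for net in NETS:
        for stuck in (0, 1):
            table.setdefault(signature({net: stuck}), []).append((net, stuck))
    return table


FAULT_TABLE = build_dictionary()


def diagnose(observed, table=FAULT_TABLE):
    """Return (status, candidate single faults) for an observed signature."""
    if observed == GOLDEN:
        return "no fault", []
    candidates = table.get(observed, [])
    if candidates:
        return "diagnosed", candidates
    return "detected only", []


single = diagnose(signature({"x1": 0}))
double = diagnose(signature({"x1": 0, "a1": 1}))
```

In this sketch the single fault is located exactly, while the double fault is flagged as an error with no location because its signature matches no single-fault entry. Two caveats apply: equivalent faults (here a1, a2, and cout stuck at 1) share a signature and are returned together as candidates, and in general a multiple fault can alias to a single-fault signature, so a dictionary built from single faults cannot be trusted to locate multiple faults.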

Finally, the goal of fully exercising the TDTMR concept was not met within the time
available. However, corrections were made to the original design from previous research.
The design now correctly implements the TMR concept and is fully laid out on the FPGA
instead of being optimized. The results of a few test setups do, however, indicate that the
voting logic is more susceptible to radiation damage than the three forms of adder logic.
Overall, the TDTMR design is prepared and capable of being fully evaluated given the
opportunity.
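
The TDTMR idea of voting across three differently designed but functionally equal modules can be sketched in software as follows. This is a behavioral illustration only, not the FPGA design evaluated in this thesis. The 4-bit width, the function names, and the use of Python's built-in addition as a stand-in for the third implementation are all assumptions made for the example.

```python
WIDTH = 4  # assumed operand width for this illustration


def ripple_carry_add(a, b):
    """Add bit by bit, passing the carry from one stage to the next."""
    carry, total = 0, 0
    for i in range(WIDTH):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total | (carry << WIDTH)


def carry_lookahead_add(a, b):
    """Add using generate/propagate terms.

    The carries are evaluated in a loop here; hardware would compute
    them in parallel from the same terms.
    """
    generate, propagate = a & b, a ^ b
    carries, c = 0, 0
    for i in range(WIDTH):
        carries |= c << i  # bit i holds the carry into position i
        c = ((generate >> i) & 1) | (((propagate >> i) & 1) & c)
    return (propagate ^ carries) | (c << WIDTH)


def builtin_add(a, b):
    """Hypothetical stand-in for a third, differently designed adder."""
    return (a + b) & ((1 << (WIDTH + 1)) - 1)


def majority(x, y, z):
    """Bitwise two-out-of-three vote."""
    return (x & y) | (x & z) | (y & z)


def tdtmr_add(a, b, upset=None):
    """Vote across the three adders.

    `upset` is an optional (module_index, bit_mask) pair that flips bits
    in one module's output to imitate a radiation-induced error.
    Returns the voted sum and the indices of modules that disagree with it.
    """
    outputs = [ripple_carry_add(a, b), carry_lookahead_add(a, b), builtin_add(a, b)]
    if upset is not None:
        index, mask = upset
        outputs[index] ^= mask
    voted = majority(*outputs)
    disagreeing = [i for i, out in enumerate(outputs) if out != voted]
    return voted, disagreeing


clean = tdtmr_add(9, 7)
masked = tdtmr_add(9, 7, upset=(1, 0b00100))
```

An upset confined to one module is outvoted and that module is identified by its disagreement with the voted result. An error in the voter itself is not masked by this arrangement, so the voter remains a single point of failure.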

5.3  Future  Work 

There is considerable opportunity for further research in this area. Radiation effects on
electronics are still not fully understood, and this research leaves several areas open for
further investigation.

First, the TDTMR design should be investigated more thoroughly. The research in this
thesis does not cover all of the possibilities. TDTMR has the potential to be extremely
effective: having three differing forms of equivalent logic could demonstrate that one form
of logic is more resilient to errors than another. If fully developed, TDTMR could then be
added to the hardening-by-design techniques. TDTMR could also be taken a step further by
using Xilinx’s floor planner tool to place the different logic blocks in different areas of
the FPGA.

Additionally, the algorithm could be refined to account for multiple-bit errors. It can
currently diagnose only single errors, but it could be extended through further research
and development into diagnosing multiple error locations. Fault detection and diagnosis in
sequential logic also deserves further research. Sequential logic is in greater demand, and
an algorithm for detecting and locating faults in it would be a significant contribution.

Finally, the test platform developed in this research can be modified and used with other
forms of radiation. The basic form of the platform is stable. Gamma radiation was the only
form that was immediately available during this research; with more time, the platform
could be used to characterize other forms.

The overall framework of this research is ongoing. The work performed for this thesis
provides a more solid foundation for taking the testing and evaluation of radiation effects
on electronics to the next level, and its goals provide a better direction for further
studies in this field.
Appendices


Appendix  A  :  Virtex  4  FX  Series  Mini-Module  Datasheet 

Found  on  CD  >  Appendix/FX  12_man.pdf 

Appendix  B  :  Carry  Look  Ahead  Adder  Schematic 

Found  on  CD  >  Appendix/full_cla.jpg 

Appendix  C  :  Ripple  Carry  Adder  Schematic 

Found  on  CD  >  Appendix/full_rc.jpg 

Appendix  D  :  Virtex  II  Pro  Datasheet 

Found  on  CD  >  Appendix/V2P_man 

Appendix  E  :  High  Speed  CMOS  Hex  Buffer  Datasheet 

Found  on  CD  >  Appendix/buffer_data.pdf 

Appendix  F  :  IC  Socket  with  Capacitor  Datasheet

Found  on  CD  >  Appendix/IC_socket.pdf 

Appendix  G  :  TESTCAD  Tool  Guide 

Found  on  CD  >  Appendix/TESTCAD  Tool  Guide.pdf 

Appendix  H  :  Full  Fault  List 

Found  on  CD  >  Appendix/cla_final_FL.xls 

Found  on  CD  >  Appendix/rc_final_FL.xls 

Appendix  I :  Result  Logs 

Found  on  CD  >  Appendix/RESULTS/ 